allow to append comment to query #709

dkuku · 2024-10-11T21:09:17Z

Recently there was introduced commit_comment to prepend comment before the commit keyword
I wanted to propose something similar for all queries - It will be useful for prepending sqlcommenter comments to queries but any other info is possible.
I tried different hacks to get around this limitation but this has to be implemented in postgrex just before the query is sent to the database.
I found the place where to add this code by trial and error, any suggestion welcome how to make it better.
Also is it possible to test it somehow? Currently I'm logging it in the database but it would be nice to log the data that is sent to the database. Maybe testing the encoder is enough?

2024-10-11 21:44:02.715 BST,"kuku","postgrex_test",624085,"127.0.0.1:55774",67098e12.985d5,1,"SELECT",2024-10-11 21:44:02 BST,3/2286,0,LOG,00000,"execute <unnamed>: /*comment*/
SELECT 321",,,,,,,,,"","client backend",,0

josevalim · 2024-10-12T07:13:16Z

Unfortunately this is not going to work based on how most users use Ecto+Postgres because the queries are prepared: which means the query is only sent once to the database, the first time around. The reason why it works with BEGIN is because it is a textual command that is small and it is sent every time.

dkuku · 2024-10-12T07:35:36Z

Thanks, I did not know that. What about adding a mode that allows to send the whole query when there is a comment? Libraries in other languages send the whole sql afaik - python example, there is more in other languages in this repo.

josevalim · 2024-10-12T07:38:20Z

We can do that and say that passing a comment automatically disables query caching, but then I beleive the implementation here has to be different. I would generally try to deal with it at the protocol level though, instead of changing the binary payloads.

dkuku · 2024-10-12T07:42:34Z

I'm happy to work on that, I just need to better understand how postgrex works.

josevalim · 2024-10-12T07:51:15Z

You probably want to extract out the comment only in this branch: https://github.com/elixir-ecto/postgrex/pull/709/files#diff-0da854f0c1cda9486d776c72ecda6a2e595a7667b72688669bbd80d6b80f0f96R354 and pass the comment as argument to parse_describe_flush after the query. You can adjust other callers to pass nil as the comment.

And, on Postgrex module, if there is a comment, you set postgrex_prepare to false.

dkuku · 2024-10-12T09:46:03Z

I'm considering whether it should be named differently, like non_cacheable_comment. That way, we could also introduce a cacheable_comment in the future, which could include static values like the module and function name. This would offer a good compromise by making it easier to locate where the query is defined (since we have access to that information using Ecto macros, then it could be added in ecto_sql).
Idea from here:
https://www.adyen.com/knowledge-hub/injecting-metadata-as-comments

dkuku · 2024-10-12T10:13:29Z

test/query_test.exs

+    end
+  end
+
+  @tag :big_binary


Makes problems when I log all queries in postgres - this inserts all the data in the log file, I need to have a way to skip it.

greg-rychlewski · 2024-10-13T12:18:42Z

Sorry if i'm way off the mark, I don't know if I completely understand the ask. But would it work to add prepare: :unnamed to the repo operations instead? That way Postgres is not caching the query. We allow it to be specified per operation so you can pick where it's used.

dkuku · 2024-10-13T12:29:41Z

When we have options to skip query caching would it be enough to add the comment in the query function where we call iodata_to_binary? instead of modifying the protocol file ?

lib/postgrex.ex

lib/postgrex/protocol.ex

josevalim · 2024-10-13T14:19:04Z

I have dropped some nitpicks. It generally looks good to me :)

dkuku · 2024-10-14T06:28:26Z

After thinking abount it I wonder if this is exactly what I need.
My initial idea was to append the comment after the query cache - I was under the impression that it can be done in postgrex but I was wrong.
If the query can't be cached then it may be a better idea to append it in the query builder and from there mark the query as non cacheable in case the comment is dynamic or leave it as is when the comment is static (caller module and function name may be enough for most use cases).
My concern is that disabling the query cache can increase database cpu load - it may not be a problem in small projects but in bigger it may be a concern. Datadog library for example does trace sampling - maybe there is a way to know if a trace will be sampled and only include the dynamic parts on demand otherwise allow just to tag query with the basic info and have caching working.
I can prepare an rfc how I imagine it and get feedback on ecto mailing list and the observability group.

josevalim · 2024-10-14T07:04:47Z

If the query can't be cached then it may be a better idea to append it in the query builder and from there mark the query as non cacheable in case the comment is dynamic or leave it as is when the comment is static (caller module and function name may be enough for most use cases).

Given queries are composable, which module and function would you use? If you want to use the module and function from the repository, then it is somewhat dynamic, but it is already information that we compute anyway, so you are right we could tackle it at the builder.

josevalim · 2024-10-14T07:05:49Z

lib/postgrex/protocol.ex

@@ -1593,6 +1606,16 @@ defmodule Postgrex.Protocol do
    transaction_error(s, postgres)
  end

+  defp parse_describe_comment_msgs(query, comment, tail) when is_binary(comment) do
+    statement = query.statement <> "/*#{comment}*/"


It seems we may no longer go wit hthis PR, but I wonder if this could be dealt as IO lists:

Suggested change

statement = query.statement <> "/*#{comment}*/"

statement = [query.statement, "/*", comment, "*/"]

dkuku · 2024-10-22T18:47:43Z

So how do we proceed with that pr?
I think it would be beneficial to still have a way to force the prepare
Repo.all(query, comment: "repo:MyApp.Repo,app:Myapp.Web", prepare: true)

josevalim · 2024-10-23T05:51:37Z

lib/postgrex/protocol.ex

+      false ->
+        comment = Keyword.get(opts, :comment)
+
+        if is_binary(comment) && String.contains?(comment, "*/") do


Suggested change

if is_binary(comment) && String.contains?(comment, "*/") do

if is_binary(comment) and String.contains?(comment, "*/") do

Let's move this validation to the client. There is no need to crash the connection on bad argument. :)

So should I just remove the whole if here?

You should move it to the client, to the Postgrex module.

I added it there in the query function.
In theory the comment could be also an iolist but Currently we are getting CaseClauseError.
I think this is hard to validate for sql injection unless we first dump it to a string.

josevalim

I added one tiny nitpick and we can ship it!

josevalim · 2024-10-24T07:47:29Z

💚 💙 💜 💛 ❤️

josevalim · 2024-10-24T07:50:55Z

Thank you for leading this. I am sure people will appreciate this feature. Please give main a try in conjuction with Ecto and let us know if it works as expected.

dkuku · 2024-10-24T07:52:50Z

Thanks, I will try it over the weekend.

dkuku · 2024-10-27T07:46:27Z

We missed something when skipping prepare - I need to dig deeper into it, but that's what I currently have:
I created a repo module that has every function replaced with a macro.
Now with such module livebook:

defmodule W do
  require Test.Repo
  alias Test.Repo
  import Ecto.Query

  def insert do
    weather1 = %Weather{temp_lo: 0, temp_hi: 23}
    Repo.insert!(weather1)
  end

  def get_query do
    values = [%{id: 1, name: "Zabrze"}, %{id: 2, name: "Dudley"}]
    Weather
    |> join(:inner, [w], w1 in values(values, %{id: :integer, name: :string}), on: w.id == w1.id)
    |> select([w, c], %Weather{id: w.id, city: c.name})
    |> Repo.all()
  end

  def get_schema do
    Weather
    |> select([w], %Weather{id: w.id})
    |> where([w], not is_nil(w.id))
    |> Repo.all()
  end
end

And calling it one after one, I'm getting the comment only on one of the queries:

2024-10-27 07:24:07.266 GMT,"postgres","postgres",192987,"127.0.0.1:43910",671dea97.2f1db,1,"INSERT",2024-10-27 07:24:07 GMT,8/7,0,LOG,00000,"execute ecto_insert_weather_0: INSERT INTO ""weather"" (""temp_lo"",""temp_hi"",""prcp"",""inserted_at"",""updated_at"") VALUES ($1,$2,$3,$4,$5) RETURNING ""id""","parameters: $1 = '2', $2 = '23', $3 = '0', $4 = '2024-10-27 07:24:07', $5 = '2024-10-27 07:24:07'",,,,,,,,"","client backend",,0
2024-10-27 07:24:07.275 GMT,"postgres","postgres",192984,"127.0.0.1:43882",671dea97.2f1d8,1,"SELECT",2024-10-27 07:24:07 GMT,9/7,0,LOG,00000,"execute <unnamed>: SELECT w0.""id"", v1.""name"" FROM ""weather"" AS w0 INNER JOIN (VALUES ($1::bigint,$2::varchar),($3::bigint,$4::varchar)) AS v1 (""id"",""name"") ON w0.""id"" = v1.""id""/*app='test_app',function='get_query%2F0',line='18',module='W',owner='team_c'*/","parameters: $1 = '1', $2 = 'Zabrze', $3 = '2', $4 = 'Dudley'",,,,,,,,"","client backend",,0
2024-10-27 07:24:07.278 GMT,"postgres","postgres",192981,"127.0.0.1:43868",671dea97.2f1d5,1,"SELECT",2024-10-27 07:24:07 GMT,4/6,0,LOG,00000,"execute ecto_7179: SELECT w0.""id"" FROM ""weather"" AS w0 WHERE (NOT (w0.""id"" IS NULL))",,,,,,,,,"","client backend",,0

as you see only one is unnamed.
When I add the option prepare: :unnamed to default_options then the second select is unnamed but the insert still has a name.

josevalim · 2024-10-27T08:23:53Z

@dkuku in Ecto, we pass a cache_statement to Postgrex.query, and I think that's overriding the prepare: false. Around here: https://github.com/elixir-ecto/postgrex/blob/master/lib/postgrex.ex#L303

We should probably ignore the cache statement if a comment is given.

Also, you shouldn't need to replace all Ecto.Repo functions. We already support stacktrace: true as part of the Repo configuration. If you set it to true, you can override def prepare_options (or a similar callback) to read the :stacktrace option and use it to build the comment option. :)

dkuku · 2024-10-27T18:36:19Z

This is getting a bit complicated, I have to pass the comment around too many function and still it is not shown everywhere.
I think much easier would be to append it to the statement asap and just properly validate that it is not prepared - for example by setting prepare: :unnamed?

I have also a question about the stacktrace - do you have any suggestion:
When i use Ecto.Repo the stacktrace that I see when inspecting in prepare_query has not much info about the code I'm running. There is only the line where use Ecto.Repo - nothing about my code.

{:all,
 #Ecto.Query<from w0 in Weather, where: not is_nil(w0.id),
  select: %Weather{id: w0.id}>,
 [
   stacktrace: [
     {Ecto.Repo.Supervisor, :tuplet, 2, [file: ~c"lib/ecto/repo/supervisor.ex", line: 179]},
     {Test.Repo, :all, 2, [file: ~c"livemd/ecto.livemd#cell:pwz2dlcaejzj2duk", line: 2]},
     {:elixir, :eval_external_handler, 3, [file: ~c"src/elixir.erl", line: 386]},
     {:erl_eval, :do_apply, 7, [file: ~c"erl_eval.erl", line: 904]},
     {:elixir, :eval_forms, 4, [file: ~c"src/elixir.erl", line: 364]},
     {Module.ParallelChecker, :verify, 1, [file: ~c"lib/module/parallel_checker.ex", line: 112]},
     {Livebook.Runtime.Evaluator, :"-eval/4-fun-0-", 3, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 635]},
     {Code, :with_diagnostics, 2, [file: ~c"lib/code.ex", line: 621]},
     {Livebook.Runtime.Evaluator, :eval, 4, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 629]},
     {Livebook.Runtime.Evaluator, :continue_do_evaluate_code, 6, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 437]},
     {Livebook.Runtime.Evaluator, :loop, 1, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 337]},
     {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 329]}
   ]
 ]}

I gave up and went back to the module I made. Then the stacktrace looks like i expect - the module is on the second line. I tested it like 20 times switching between my implementation and Ecto.

{:all,
 #Ecto.Query<from w0 in Weather, where: not is_nil(w0.id),
  select: %Weather{id: w0.id}>,
 [
   stacktrace: [
     {Ecto.Repo.Supervisor, :tuplet, 2, [file: ~c"lib/ecto/repo/supervisor.ex", line: 179]},
     {WeatherContext, :get_schema, 0, [file: ~c"livemd/ecto.livemd#cell:4h52ky3654zcawdu", line: 32]},
     {:elixir, :eval_external_handler, 3, [file: ~c"src/elixir.erl", line: 386]},
     {:erl_eval, :do_apply, 7, [file: ~c"erl_eval.erl", line: 904]},
     {:elixir, :eval_forms, 4, [file: ~c"src/elixir.erl", line: 364]},
     {Module.ParallelChecker, :verify, 1, [file: ~c"lib/module/parallel_checker.ex", line: 112]},
     {Livebook.Runtime.Evaluator, :"-eval/4-fun-0-", 3, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 635]},
     {Code, :with_diagnostics, 2, [file: ~c"lib/code.ex", line: 621]},
     {Livebook.Runtime.Evaluator, :eval, 4, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 629]},
     {Livebook.Runtime.Evaluator, :continue_do_evaluate_code, 6, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 437]},
     {Livebook.Runtime.Evaluator, :loop, 1, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 337]},
     {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 329]}
   ]

The code looks like this:

josevalim · 2024-10-27T18:52:39Z

I mentioned this in my previous comment:

Also, you shouldn't need to replace all Ecto.Repo functions. We already support stacktrace: true as part of the Repo configuration. If you set it to true, you can override def prepare_options (or a similar callback) to read the :stacktrace option and use it to build the comment option. :)

It doesn't work? That should alleviate the need to pass comments. You should only need to do it on prepare_query/prepare_options/etc.

I have also a question about the stacktrace - do you have any suggestion

Yes, don't use the shell/livebook. Those are evaluated. Stacktrace is precise on compiled code.

implement comment in postgrex

be74cce

dkuku force-pushed the allow_to_prepend_query_with_comment branch from 5aee3aa to be74cce Compare October 12, 2024 09:33

move at the end of query

409d07d

dkuku commented Oct 12, 2024

View reviewed changes

set postgrex_prepare in query

a5121fb